Symmetrization and Rademacher Complexity
Abstract
Let’s first see how primitive covers were inadequate. Recall that a function class $G$ is a primitive cover for a function class $F$ at scale $\epsilon > 0$ over some set $S$ if:

• $G \subseteq F$,
• $|G| < \infty$, and
• for every $f \in F$ there exists $g \in G$ with $\sup_{x \in S} |g(x) - f(x)| \le \epsilon$.

Last class, we gave a generalization bound for classes with primitive covers (basically, primitive covers give discretizations, and then we apply finite-class generalization); a small numerical construction of such a cover is sketched at the end of this section.

Problems with primitive covers. It’s pretty easy to run into the limits of this technique.

• Consider linear predictors as before, but with points $x \in \mathbb{R}^d$ drawn from some unbounded distribution, for instance a Gaussian. This immediately breaks the earlier construction. One fix is to truncate the distribution: since Gaussians concentrate well, we can find an $X$ so that $\|x\|_2 \le X$ with probability at least $1 - \delta$, and this $X$ does not depend too badly on $n$ (recall from homework 1 the analysis of the maximum of a collection of scalar Gaussian random variables; a back-of-the-envelope choice of $X$ appears at the end of this section). So we can first condition away an event of probability at most $\delta$ on which some points have $\|x\|_2 > X$, and then run the cover argument as before.

• Consider discontinuous function classes, for instance $x \mapsto \operatorname{sgn}(\langle w, x \rangle)$. If $\epsilon < 2$, then for any linear classifier $f$ there must exist $g_f \in G$ that exactly agrees with $f$ on every point (i.e., any $\epsilon < 2$ may as well be $\epsilon = 0$). Since for any $x \neq 0$ and $w \neq 0$ we have $\operatorname{sgn}(\langle w, x \rangle) \neq \operatorname{sgn}(\langle -w, x \rangle)$, it follows that the primitive covering number is again infinite: for any $w \neq 0$, the only vectors within distance $< 2$ of $w$ in this metric are those in $\{cw : c > 0\}$, so the cover must include one vector for each direction, as well as $0$. There are a number of ways to fix this (including giving non-primitive covers); we will come back to it after discussing Rademacher complexity.

There is a better notion of cover that fixes both problems, but we’ll get there through Rademacher complexity.
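As promised in the first bullet above, here is one back-of-the-envelope choice of the truncation radius $X$; this sketch is ours, using the standard Lipschitz concentration of the Gaussian norm rather than anything specific from lecture. For $x \sim \mathcal{N}(0, I_d)$, the map $x \mapsto \|x\|_2$ is 1-Lipschitz and $\mathbb{E}\|x\|_2 \le \sqrt{d}$, whence
\[
\Pr\big[\|x\|_2 \ge \sqrt{d} + t\big] \le e^{-t^2/2}.
\]
Union bounding over $n$ i.i.d. samples and choosing $t = \sqrt{2\ln(n/\delta)}$ gives, with probability at least $1 - \delta$,
\[
\max_{i \le n} \|x_i\|_2 \le X := \sqrt{d} + \sqrt{2\ln(n/\delta)},
\]
so $X$ grows only like $\sqrt{\ln n}$, which is the sense in which it does not depend too badly on $n$.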
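Finally, here is the minimal numerical sketch of a primitive cover referenced above: it grids the unit ball of weights at spacing proportional to $\epsilon / (X\sqrt{d})$, which by Cauchy–Schwarz gives a sup-norm $\epsilon$-cover of $F = \{x \mapsto \langle w, x \rangle : \|w\|_2 \le 1\}$ over $\{\|x\|_2 \le X\}$. The construction and every name in it are ours, not from lecture.

import itertools
import numpy as np

def primitive_cover(d, X, eps):
    # Cauchy-Schwarz: |<w,x> - <g,x>| <= ||w - g||_2 * ||x||_2 <= ||w - g||_2 * X,
    # so grid spacing h with sqrt(d) * h <= eps / (2 X) suffices; the extra
    # factor of 2 leaves slack for boundary effects of the unit-ball constraint.
    h = eps / (2 * X * np.sqrt(d))
    ticks = np.arange(-1.0, 1.0 + h, h)
    # Keep only grid points inside the unit ball, so that G is a subset of F.
    return [np.array(g) for g in itertools.product(ticks, repeat=d)
            if np.linalg.norm(g) <= 1.0]

rng = np.random.default_rng(0)
d, X, eps = 2, 1.0, 0.25
cover = primitive_cover(d, X, eps)

# Sanity check: a random f in F is eps-close to some g in G on points with ||x||_2 = X.
xs = rng.normal(size=(200, d))
xs = X * xs / np.linalg.norm(xs, axis=1, keepdims=True)
w = rng.normal(size=d)
w /= np.linalg.norm(w)
best = min(np.max(np.abs(xs @ w - xs @ g)) for g in cover)
print(f"|G| = {len(cover)}, closest sup-distance = {best:.3f} (target eps = {eps})")

Note that the number of grid points scales like $(1/h)^d$, i.e., exponentially in $d$ and polynomially in $X/\epsilon$; this is exactly why letting $X$ (and hence the grid) grow unboundedly breaks the argument in the first bullet.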